storage: deflake TestStoreRangeMergeWatcher #31215

benesch · 2018-10-10T20:17:35Z

This test could deadlock if the LHS replica on store2 was shut down
before it processed the split at "b". Teach the test to wait for the LHS
replica on store2 to process the split before blocking Raft traffic to
it.

Fixes #31096.
Fixes #31149.
Fixes #31160.
Fixes #31167.

Release note: None

cockroach-teamcity · 2018-10-10T20:17:41Z

This change is Reviewable.

tbg

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale)

benesch · 2018-10-10T21:32:15Z

bors r=tschottdorf

31013: kv: try next replica on RangeNotFoundError r=nvanbenschoten,bdarnell a=tschottdorf

Previously, if a Batch RPC came back with a RangeNotFoundError, we would immediately stop trying to send to more replicas, evict the range descriptor, and start a new attempt after a back-off. This new attempt could end up using the same replica, so if the RangeNotFoundError persisted for some amount of time, so would the unsuccessful retries for requests to it, as DistSender doesn't aggressively shuffle the replicas.

It turns out that there are such situations, and the election-after-restart roachtest spuriously hit one of them:

1. new replica receives a preemptive snapshot and the ConfChange
2. cluster restarts
3. now the new replica is in this state until the range wakes up, which may not happen for some time.
4. the first request to the range runs into the above problem

@nvanbenschoten: I think there is an issue to be filed about the tendency of DistSender to get stuck in unfortunate configurations.

Fixes #30613.

Release note (bug fix): Avoid repeatedly trying a replica that was found to be in the process of being added.

31187: roachtest: add synctest r=bdarnell a=tschottdorf

This new roachtest sets up a charybdefs on a single (Ubuntu) node and runs the `synctest` cli command against a nemesis that injects random I/O errors.

The synctest command is new. It simulates a Raft log and can be directed at a filesystem that is being hit with random failures. The workload essentially writes ascending keys (flushing each one to disk synchronously) until an I/O error occurs, at which point it re-opens the instance to verify that all persisted writes are still there. If the RocksDB instance was permanently corrupted, it switches to a new, pristine, directory. This is used in the roachtest, but is also useful for manual use in user deployments in which we suspect there is a failure to persist data to disk.

This hasn't found anything, but it's fun to watch and also shows us a number of errors that we know and love from sentry.

Release note: None

31215: storage: deflake TestStoreRangeMergeWatcher r=tschottdorf a=benesch

This test could deadlock if the LHS replica on store2 was shut down before it processed the split at "b". Teach the test to wait for the LHS replica on store2 to process the split before blocking Raft traffic to it.

Fixes #31096.
Fixes #31149.
Fixes #31160.
Fixes #31167.

Release note: None

31217: importccl: add explicit default to mysql testdata timestamp r=dt a=dt

This makes the testdata work on mysql 8.0.2+, where the timestamp type no longer has the implicit defaults.

Release note: none.

31221: cluster: Create final cluster version for 2.1 r=bdarnell a=bdarnell

Release note: None

Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
Co-authored-by: Nikhil Benesch <nikhil.benesch@gmail.com>
Co-authored-by: David Taylor <tinystatemachine@gmail.com>
Co-authored-by: Ben Darnell <ben@bendarnell.com>

craig · 2018-10-10T22:56:06Z

Build succeeded

benesch requested review from bdarnell, tbg and a team October 10, 2018 20:17

benesch force-pushed the deflake-watcher branch from 6df8e35 to 1c7b427 October 10, 2018 20:42

tbg approved these changes Oct 10, 2018

tbg mentioned this pull request Oct 10, 2018

teamcity: failed test: TestStoreRangeMergeWatcher #31224

Closed

craig bot merged commit 1c7b427 into cockroachdb:master Oct 10, 2018

tbg mentioned this pull request Oct 11, 2018

backport-2.1: storage: deflake TestStoreRangeMergeWatcher #31248

Merged

benesch deleted the deflake-watcher branch October 15, 2018 15:46
